Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update type-parser for v2 #1514

Merged
merged 13 commits into from
Jun 23, 2022
Merged

Update type-parser for v2 #1514

merged 13 commits into from
Jun 23, 2022

Conversation

jpivarski
Copy link
Member

A more proper fix than #1513. (See continuing issues in scikit-hep/uproot5/pull/620.)

@codecov
Copy link

codecov bot commented Jun 23, 2022

Codecov Report

Merging #1514 (642e6b9) into main (0c6ae03) will decrease coverage by 2.66%.
The diff coverage is 84.89%.

Impacted Files Coverage Δ
...awkward/_v2/_connect/rdataframe/from_rdataframe.py 0.00% <0.00%> (ø)
...c/awkward/_v2/_connect/rdataframe/to_rdataframe.py 0.00% <0.00%> (ø)
src/awkward/_v2/types/_awkward_datashape_parser.py 47.66% <ø> (ø)
src/awkward/_v2/_connect/cling.py 25.99% <9.09%> (-0.35%) ⬇️
src/awkward/_v2/types/recordtype.py 87.06% <91.66%> (+0.83%) ⬆️
src/awkward/_v2/types/type.py 96.46% <96.10%> (-1.27%) ⬇️
src/awkward/_v2/_lookup.py 97.51% <100.00%> (+0.01%) ⬆️
src/awkward/_v2/types/__init__.py 100.00% <100.00%> (ø)
src/awkward/_v2/types/regulartype.py 90.47% <0.00%> (+4.76%) ⬆️

@jpivarski jpivarski marked this pull request as ready for review June 23, 2022 03:31
@jpivarski
Copy link
Member Author

The two tests that are failing are failing in a dtype merger test:

  _______________________________ test_numpyarray ________________________________
  
      def test_numpyarray():
          for dtype1 in ("i1", "i2", "i4", "i8", "u1", "u2", "u4", "u8", "f4", "f8", "?"):
              for dtype2 in ("i1", "i2", "i4", "i8", "u1", "u2", "u4", "u8", "f4", "f8", "?"):
                  for dtype3 in (
                      "i1",
                      "i2",
                      "i4",
                      "i8",
                      "u1",
                      "u2",
                      "u4",
                      "u8",
                      "f4",
                      "f8",
                      "?",
                  ):
                      for dtype4 in (
                          "i1",
                          "i2",
                          "i4",
                          "i8",
                          "u1",
                          "u2",
                          "u4",
                          "u8",
                          "f4",
                          "f8",
                          "?",
                      ):
                          one = np.array([0, 1, 2], dtype=dtype1)
                          two = np.array([3, 0], dtype=dtype2)
                          three = np.array([], dtype=dtype3)
                          four = np.array([4, 5, 0, 6, 7], dtype=dtype4)
                          combined = np.concatenate([one, two, three, four])
  
                          ak_combined = ak.layout.NumpyArray(one).mergemany(
                              [
                                  ak.layout.NumpyArray(two),
                                  ak.layout.NumpyArray(three),
                                  ak.layout.NumpyArray(four),
                              ]
                          )
  
                          assert ak.to_list(ak_combined) == combined.tolist()
  >                       assert ak.to_numpy(ak_combined).dtype == combined.dtype
  E                       assert dtype('float64') == dtype('float32')
  E                        +  where dtype('float64') = array([0., 1., 2., 3., 0., 4., 5., 0., 6., 7.]).dtype
  E                        +    where array([0., 1., 2., 3., 0., 4., 5., 0., 6., 7.]) = <function to_numpy at 0x7f997209cee0>(<NumpyArray format="d" shape="10" data="0 1 2 3 0 4 5 0 6 7" at="0x7f997322b980"/>)
  E                        +      where <function to_numpy at 0x7f997209cee0> = ak.to_numpy
  E                        +  and   dtype('float32') = array([0., 1., 2., 3., 0., 4., 5., 0., 6., 7.], dtype=float32).dtype
  
  ak_combined = <NumpyArray format="d" shape="10" data="0 1 2 3 0 4 5 0 6 7" at="0x7f997322b980"/>
  combined   = array([0., 1., 2., 3., 0., 4., 5., 0., 6., 7.], dtype=float32)
  dtype1     = 'i1'
  dtype2     = 'i1'
  dtype3     = 'u2'
  dtype4     = 'f4'
  four       = array([4., 5., 0., 6., 7.], dtype=float32)
  one        = array([0, 1, 2], dtype=int8)
  three      = array([], dtype=uint16)
  two        = array([3, 0], dtype=int8)
  
  work/awkward/awkward/tests/test_0449-merge-many-arrays-in-one-pass.py:52: AssertionError

They use numpy-1.23.0-cp38-cp38-macosx_10_9_x86_64.whl and numpy-1.23.0-cp39-cp39-win_amd64.whl, respectively, but other tests run NumPy 1.23.

In particular, the latest Azure MacOS runs numpy-1.23.0-cp310-cp310-macosx_10_9_x86_64.whl and the latest Azure Windows runs numpy-1.23.0-cp310-cp310-win_amd64.whl.

Hmmm. The test failure is in v1, which has to manually duplicate what NumPy does. v2 probably uses NumPy's own dtype promotion algorithm directly (i.e. concatenate two empty arrays of the appropriate dtype and see what the result is). Actually, it's not even that: it directly asks NumPy to concatenate them:

contiguous_arrays = self._nplike.concatenate(contiguous_arrays)
next = NumpyArray(contiguous_arrays, self._identifier, parameters, self._nplike)

So even if NumPy changes their promotion rules, v2 should be insensitive to that change (it should change along with them). But those actual tests are commented out!

assert to_list(ak_combined) == combined.tolist()
assert ak_combined.dtype == combined.dtype
# assert ak._v2.contents.NumpyArray(one).typetracer.mergemany(
# [
# ak._v2.contents.NumpyArray(two),
# ak._v2.contents.NumpyArray(three),
# ak._v2.contents.NumpyArray(four),
# ]
# ).form == ak._v2.contents.NumpyArray(one).mergemany(
# [
# ak._v2.contents.NumpyArray(two),
# ak._v2.contents.NumpyArray(three),
# ak._v2.contents.NumpyArray(four),
# ]
# ).form
ak_combined = ak._v2.contents.NumpyArray(one).mergemany(
[
ak._v2.contents.NumpyArray(two),
ak._v2.contents.EmptyArray(),
ak._v2.contents.NumpyArray(four),
]
)
assert to_list(ak_combined) == combined.tolist()
assert ak_combined.dtype == np.concatenate([one, two, four]).dtype
# assert ak._v2.contents.NumpyArray(one).typetracer.mergemany(
# [
# ak._v2.contents.NumpyArray(two),
# ak._v2.contents.EmptyArray(),
# ak._v2.contents.NumpyArray(four),
# ]
# ).form == ak._v2.contents.NumpyArray(one).mergemany(
# [
# ak._v2.contents.NumpyArray(two),
# ak._v2.contents.EmptyArray(),
# ak._v2.contents.NumpyArray(four),
# ]
# ).form

I should do another PR to uncomment those and make sure they work. I don't care too much about whether v1 only agrees with a range of NumPy versions, it's about to become irrelevant soon. So that PR might drop the v1 test, while enabling the v2 test.

@jpivarski jpivarski merged commit 4ba1dbb into main Jun 23, 2022
@jpivarski jpivarski deleted the jpivarski/update-type-parser-for-v2 branch June 23, 2022 22:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant